Cross-Matching Multiple Spatial Observations and Dealing with Missing Data

Authors

  • Jim Gray
  • Alexander S. Szalay
  • Tamas Budavari
  • Robert Lupton
  • María A. Nieto-Santisteban
  • Ani Thakar
Abstract

Cross-match spatially clusters and organizes several astronomical point-source measurements from one or more surveys. Ideally, each object would be found in each survey. Unfortunately, the observation conditions and the objects themselves change continually. Even some stationary objects are missing in some observations: sometimes objects have a variable light flux, and sometimes the seeing is worse. In most cases we face a substantial number of differences in object detections, both between surveys and between observations taken at different times within the same survey or instrument. Dealing with such missing observations is a difficult problem. The first step is to classify each miss as ephemeral (the object moved or simply disappeared), masked (noise hid or corrupted the object's observation), or edge (the object was near the edge of the observational field). This classification and a spatial library for representing and manipulating observational footprints help construct a Match table recording both hits and misses. Transitive closure clusters friends-of-friends into object bundles, and the bundle summary statistics are recorded in a Bundle table. This design is an evolution of the Sloan Digital Sky Survey cross-match design, which compared overlapping observations taken at different times.

1. Terminology: Hits, Misses, Ephemeral, Masked, Edge

Given several observations of the sky, called runs, astronomers often want to cross-match all the observations of each object from all runs that observed that object. A typical first step is to process the runs to make an object catalog. The catalog entries typically take the form:

(runID, objectID, position, positionError, other attributes...)

Two objects are said to match if they come from different runs and if their positions differ by less than their classification distance. The choice of classification distance depends on the data and on the intended use of the cross-match.
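The transitive-closure step can be sketched with a union-find (disjoint-set) structure. Every pairwise match merges two objects' sets, and the resulting connected components are the friends-of-friends bundles. This is a minimal in-memory sketch that assumes objects are keyed by illustrative (runID, objectID) tuples and that the matching pairs have already been found. It does not model the Match and Bundle tables described above, and it does not record misses.

```python
from collections import defaultdict
from typing import Hashable, Iterable


def friends_of_friends(keys: Iterable[Hashable],
                       match_pairs: Iterable[tuple[Hashable, Hashable]]) -> list[list]:
    """Group objects into bundles: the transitive closure of the match relation."""
    parent = {k: k for k in keys}  # each object starts in its own bundle

    def find(x):
        # Walk up to the bundle's root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in match_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two bundles

    groups = defaultdict(list)
    for k in parent:
        groups[find(k)].append(k)
    # Sort members and bundles so the output is deterministic.
    return sorted(sorted(g) for g in groups.values())


# Keys are hypothetical (runID, objectID) pairs.
keys = [(1, 1), (2, 5), (3, 9), (1, 2), (2, 6)]
pairs = [((1, 1), (2, 5)), ((2, 5), (3, 9)), ((1, 2), (2, 6))]
print(friends_of_friends(keys, pairs))
# [[(1, 1), (2, 5), (3, 9)], [(1, 2), (2, 6)]]
```

Objects (1, 1) and (3, 9) never match directly, but they end up in the same bundle through their shared friend (2, 5). That transitive chaining is what the closure adds beyond pairwise matching.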
If only stationary objects are to be matched, then the classification distance can be a small multiple of the larger of the two objects' circular rms position errors. The position uncertainty, or astrometric precision, is often constant for all objects in an observation. When comparing data from different instruments, or from times with different seeing, however, the position uncertainties may differ. Various systematic effects can add to these uncertainties. A rigorous statistical argument, based on the mean object density and other parameters, can recommend an optimal Bayes classification distance. Consider a point in one run and assume perfect accuracy. The probability of finding another point at separation r in another run is then the sum of three terms: a Dirac delta for the object itself, a contribution from the spatial correlation function (from clustering), and a random Poisson component. Observational errors, motions, and object sizes each add their own errors, which must be convolved with this distribution. These convolutions broaden the Dirac delta. At the same time there are inevitable false detections and chance coincidences. We want a classification distance that minimizes the overall error, that is, the combined false positives and false negatives. Ideally one could use a Bayes decision criterion, but the object surface density is not uniform across the sky. Some studies are interested in moving objects. Others work with data collected over an epoch in which the Earth's observational position affects the object's relative position. In those cases the object's apparent movement may exceed the positional error, so the match criterion needs a larger threshold. The technique described here can handle slow-moving objects, where the relative motion during the observational epoch is small compared to the average distance between objects. We return to that issue in Section 5, but for now assume that we only intend to cross-match stationary objects.
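The stationary-object match rule can be sketched as follows. The sketch makes several assumptions. Positions are right ascension and declination in degrees, and errors are in arcseconds. The `Obs` record is illustrative rather than the catalog schema. The multiplier `k = 5` is a hypothetical placeholder for the "small multiple" and is not a value taken from this paper. The angular separation comes from the chord between unit vectors, which stays accurate at sub-arcsecond scales.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Obs:
    # Hypothetical catalog record: (runID, objectID, position, positionError).
    run_id: int
    object_id: int
    ra_deg: float
    dec_deg: float
    pos_err_arcsec: float  # circular rms position error


def unit_vector(ra_deg: float, dec_deg: float) -> tuple[float, float, float]:
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def separation_arcsec(a: Obs, b: Obs) -> float:
    # Chord length between unit vectors, converted to a great-circle angle.
    chord = math.dist(unit_vector(a.ra_deg, a.dec_deg), unit_vector(b.ra_deg, b.dec_deg))
    angle_rad = 2.0 * math.asin(min(1.0, chord / 2.0))
    return math.degrees(angle_rad) * 3600.0


def classification_distance(err1: float, err2: float, k: float = 5.0) -> float:
    # A small multiple of the larger rms position error; k is an assumed value.
    return k * max(err1, err2)


def is_match(a: Obs, b: Obs, k: float = 5.0) -> bool:
    # Objects from the same run never match each other.
    if a.run_id == b.run_id:
        return False
    limit = classification_distance(a.pos_err_arcsec, b.pos_err_arcsec, k)
    return separation_arcsec(a, b) < limit


a = Obs(1, 1, 10.0, 20.0, 0.1)
b = Obs(2, 7, 10.0, 20.0 + 0.3 / 3600, 0.1)  # 0.3" north of a, different run
c = Obs(1, 2, 10.0, 20.0 + 0.3 / 3600, 0.1)  # same offset, same run as a
print(is_match(a, b), is_match(a, c))  # True False
```

Taking the larger of the two errors follows the rule stated in the text. A study that prefers to combine the errors in quadrature could swap out `classification_distance` without touching `is_match`.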
For example, SDSS Data Release 5 [6] chose a classification distance of 1.0”. The survey has an astrometric precision of 0.1” and an average inter-object distance of 21”. It nevertheless chose this large classification distance, 10× the astrometric precision, to include slowly moving objects in the cross-match. If the SDSS covered the Galactic plane, let alone the Galactic center, its fields would be very crowded, and such a large classification distance would cause a combinatorial explosion of candidate matches. In what follows we assume that the study has selected a classification distance function: ClassificationDistance(positionError1, positionError2).
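A rough estimate of chance coincidences helps put the SDSS figures in perspective. As a crude approximation that the text does not make, assume a uniform surface density n ≈ 1/d², where d is the mean inter-object distance. The expected number of unrelated objects inside a match circle of radius r is then πr²n. The 3″ spacing used below for a crowded field is purely hypothetical.

```python
import math


def expected_chance_neighbors(radius_arcsec: float, mean_spacing_arcsec: float) -> float:
    # Crude assumption: uniform density n = 1 / d**2 objects per square arcsecond.
    density = 1.0 / mean_spacing_arcsec ** 2
    return math.pi * radius_arcsec ** 2 * density


# SDSS DR5 numbers from the text: r = 1.0", d = 21".
print(f"{expected_chance_neighbors(1.0, 21.0):.4f}")  # 0.0071
# Hypothetical crowded field with d = 3".
print(f"{expected_chance_neighbors(1.0, 3.0):.3f}")   # 0.349
```

Under this assumption, fewer than 1% of SDSS match circles would contain an unrelated object. In the hypothetical crowded field, roughly a third would. Chance matches at that rate could then chain unrelated objects into ever larger friends-of-friends bundles.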


Journal title:
  • CoRR

Volume abs/cs/0701172

Publication date 2006